Mixing and Merging for Spoken Document Retrieval

Authors

  • Mark Sanderson
  • Fabio Crestani
Abstract

This paper describes a number of experiments that explored the issues surrounding the retrieval of spoken documents. Two such issues were examined. First, attempting to find the best use of speech recogniser output to produce the highest retrieval effectiveness. Second, investigating the potential problems of retrieving from a so-called "mixed collection", i.e. one that contains documents from both a speech recognition system (producing many errors) and from hand transcription (producing presumably near perfect documents). The result of the first part of the work found that merging the transcripts of multiple recognisers showed most promise. The investigation in the second part showed how the term weighting scheme used in a retrieval system was important in determining whether the system was affected detrimentally when retrieving from a mixed collection.

Similar resources

TREC-7 Experiments at the University of Maryland

The University of Maryland participated in three TREC-7 tasks: ad hoc retrieval, cross-language retrieval, and spoken document retrieval. The principal focus of the work was evaluation of merging techniques for cross-language text retrieval from mixed language collections. The results show that biasing the merging strategy in favor of documents in the query language can be helpful. Ad hoc and s...

AT&T at TREC-6: SDR Track

In the spoken document retrieval track, we study how higher word-recall (recognizing many of the spoken words) affects the retrieval effectiveness for speech documents, given that high word-recall comes at a cost of low word-precision (recognizing many words that were not actually spoken). We hypothesize that information retrieval algorithms would benefit from a higher word-recall and are robust again...

Publication date: 1998